Data analysis method for parallel DHP based on Hadoop

YANG Yanxia, FENG Lin

Journal of Computer Applications 2016, 36 (12): 3280-3284. DOI: 10.11772/j.issn.1001-9081.2016.12.3280

A bottleneck of the Apriori algorithm for mining association rules is generating the frequent 2-itemset L₂ from the candidate set C₂. The Direct Hashing and Pruning (DHP) algorithm uses a generated hash table H₂ to delete useless candidate itemsets from C₂, which improves the efficiency of generating L₂. However, the traditional DHP is a serial algorithm and cannot deal effectively with large-scale data. To solve this problem, a parallel DHP algorithm, termed H_DHP, was proposed. First, the feasibility of a parallel strategy for DHP was analyzed and proved theoretically. Then, methods for generating the hash table H₂ and the frequent itemsets L₁ and L₃ to L_k in parallel were developed on Hadoop, and the association rules were generated using the HBase database. Simulation results show that, compared with the DHP algorithm, the H_DHP algorithm performs better in data processing efficiency, the size of the data set it can handle, speedup, and scalability.
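
To make the role of the hash table H₂ concrete, the following is a minimal serial sketch of DHP-style pruning of C₂ in Python. It is not the H_DHP implementation, whose Hadoop and HBase design the abstract does not spell out. The function names, the integer item IDs, the bucket count, and the hash function `(a * 10 + b) % num_buckets` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function for a 2-itemset of integer item IDs.
    a, b = pair
    return (a * 10 + b) % num_buckets


def dhp_frequent_pairs(transactions, min_support, num_buckets=7):
    """Return (L1, C2, L2) using a hash table H2 to prune C2."""
    item_counts = Counter()
    h2 = [0] * num_buckets

    # Pass 1: count single items and hash every 2-subset into H2.
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            h2[bucket_of(pair, num_buckets)] += 1

    l1 = sorted(i for i, c in item_counts.items() if c >= min_support)

    # A bucket count is an upper bound on the support of every pair in it,
    # so a pair in a bucket below min_support cannot be frequent.
    c2 = [
        pair
        for pair in combinations(l1, 2)
        if h2[bucket_of(pair, num_buckets)] >= min_support
    ]

    # Pass 2: count exact support only for the surviving candidates.
    candidates = set(c2)
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for pair in combinations(items, 2):
            if pair in candidates:
                pair_counts[pair] += 1

    l2 = sorted(p for p, c in pair_counts.items() if c >= min_support)
    return l1, c2, l2


if __name__ == "__main__":
    demo = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
    l1, c2, l2 = dhp_frequent_pairs(demo, min_support=2)
    print("L1:", l1)
    print("C2 after hash pruning:", c2)
    print("L2:", l2)
```

In this sketch the first pass treats each transaction independently and only adds to counters. That is the property that makes a parallel version plausible, with per-transaction counting done in a map step and the counts summed in a reduce step, although the abstract does not describe how H_DHP actually divides this work.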
